Design of Vietnamese Speech Corpus and Current Status
Abstract
This paper presents the current status of spoken language resources for Vietnamese, and the related activities, at research institutions such as the Institute of Information Technology, Vietnamese Academy of Science and Technology, and the International Research Center MICA, Hanoi University of Technology. This is our first attempt at building a large Vietnamese speech database, and the corpora should follow a common design so that they are available to researchers in Vietnamese speech processing.
Similar resources
Fujisaki model based F0 contours in Vietnamese TTS
The current paper presents preliminary work towards the integration of the Fujisaki model into the VnVoice Vietnamese TTS system, based on a set of rules to control the F0 contour. A speech corpus consisting of 20 sentences was compiled. Each of the sentences can have various meanings depending on the tone associated with a monosyllabic keyword which it contains. The corpus with a total of 46 s...
Statistical Analysis of Vietnamese Dialect Corpus and Dialect Identification Experiments
The performance of speech recognition systems improves when the corpus is organized for a specialized domain and applied consistently to recognition in specific situations. Vietnamese has a variety of dialects. Building a corpus of Vietnamese dialects is the first step toward implementing a dialect identification system used for increasing the performance of Vietnames...
Optimization on Vietnamese large vocabulary speech recognition
This paper summarizes our latest efforts toward a large vocabulary speech recognition system for Vietnamese. We describe the Vietnamese text and speech database which we collected as part of our GlobalPhone corpus. Based on these data we improve our initial Vietnamese recognition system [1] by applying various state-of-the-art techniques such as semi-tied covariance and discriminative training....
First steps in building a large vocabulary continuous speech recognition system for Vietnamese
This paper presents an overview of our activities for building a Large Vocabulary Continuous Speech Recognition (LVCSR) system for Vietnamese, carried out at CLIPS-IMAG Laboratory (France) and International Research Center MICA (Vietnam). Firstly, we propose a new methodology for fast text corpora acquisition for minority languages, which has been applied to Vietnamese. Secondly, the first resul...
Towards a Multi-Objective Corpus for Vietnamese Language
Today, corpora play an important role in the development and evaluation of language and speech technologies, such as part-of-speech tagging, parsing, word sense disambiguation, text categorization, named entity classification, information extraction, question answering, structure discovery (clustering), speech recognition, and machine translation systems. One can exploit valuable statistical param...
Publication date: 2006